Spark Shell

Spark comes with interactive shells that enable ad hoc data analysis. Spark’s shells allow you to interact with data that is distributed on disk or in memory across many machines, and Spark takes care of automatically distributing this processing. Because Spark can load data into memory on the worker nodes, many distributed computations, even ones that process terabytes of data across dozens of machines, can run in a few seconds. This makes the sort of iterative, ad hoc, and exploratory analysis commonly done in shells a good fit for Spark. Spark provides both Python and Scala shells.

In this case, I have the HDP 2.5 Sandbox VM installed on a standalone machine. Here is how I start the Spark Scala shell.
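A minimal sketch of the launch steps, assuming the default Hortonworks sandbox configuration (SSH forwarded to port 2222 on the host, and the Spark client installed under `/usr/hdp/current/spark-client` — adjust these if your sandbox differs):

```shell
# Connect to the sandbox VM (the sandbox forwards SSH to port 2222 by default)
ssh root@localhost -p 2222

# Start the interactive Spark Scala shell; on HDP it is typically on the PATH,
# otherwise invoke it via the full client path:
#   /usr/hdp/current/spark-client/bin/spark-shell
spark-shell
```

Once the shell comes up, a `SparkContext` is available as `sc`, so you can immediately run ad hoc computations, for example `sc.parallelize(1 to 100).sum()`.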
